
fix: updating SafeToRetry() function to retry on wrapped errors #141

Merged 1 commit into jackc:master on May 9, 2024

Conversation

@tjasko (Contributor) commented Apr 24, 2024

There is a rather unique bug when a single connection is being slammed with queries. When statement caching is enabled and the `deallocateInvalidatedCachedStatements()` method is called to clean up, the `Pipeline.Sync()` method can error out because the connection is in use.

`connLockError` already implements the implicit `SafeToRetry() bool` interface, which is then checked by the `SafeToRetry()` function. When the connection is locked because it is in use, the `deallocateInvalidatedCachedStatements()` method can return a wrapped error that looks like:

`failed to deallocate cached statement(s): conn busy`

When the `stdlib` PGX wrapper is used, it is possible for this error not to be retried, as the wrapper also relies on this `SafeToRetry()` function. To correct this and prevent similar errors in the future, the `SafeToRetry()` function has been updated to check against wrapped errors as well.
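For illustration, here is a minimal sketch of the difference (the function names are illustrative and the exact diff may differ): a plain type assertion only sees the outermost error, while an `errors.As`-based check walks the chain of wrapped errors.

```go
package retry

import "errors"

// safeToRetryDirect: a plain type assertion only inspects the outermost error,
// so a wrapped retry-safe error such as
// "failed to deallocate cached statement(s): conn busy" is reported as unsafe.
func safeToRetryDirect(err error) bool {
	if e, ok := err.(interface{ SafeToRetry() bool }); ok {
		return e.SafeToRetry()
	}
	return false
}

// safeToRetryWrapped: errors.As walks the chain of wrapped errors, so the
// SafeToRetry method on the underlying error is still found and honored.
func safeToRetryWrapped(err error) bool {
	var retryableErr interface{ SafeToRetry() bool }
	if errors.As(err, &retryableErr) {
		return retryableErr.SafeToRetry()
	}
	return false
}
```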

If approved, the change should be made in PGX as well:
https://github.com/jackc/pgx/blob/v5.5.5/pgconn/errors.go#L13-L19

@jackc (Owner) commented May 8, 2024

Regarding `failed to deallocate cached statement(s): conn busy`:

`conn busy` should never occur unless there is a bug in the application code or pgx itself. If this is a pgx bug, then that should be fixed instead of essentially papering over the underlying issue.

As far as the actual change goes... Hmmm... Errors are considered unsafe to retry if they do not define a `SafeToRetry` method. Considering an unsafe error safe because it wraps a safe error seems a little strange. But on the other hand, it's difficult to think of a case where that wouldn't be correct. Maybe something with batches or pipeline mode. 🤔
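To make that concern concrete, here is a small, self-contained illustration (the `lockErr` type is a stand-in, not pgconn's actual `connLockError`): with the wrapped-error check, an error that merely wraps a retry-safe error is itself reported as safe, even though the wrapping code never defined `SafeToRetry`.

```go
package main

import (
	"errors"
	"fmt"
)

// lockErr is a stand-in for a retry-safe error like pgconn's connLockError.
type lockErr struct{}

func (lockErr) Error() string     { return "conn busy" }
func (lockErr) SafeToRetry() bool { return true }

// safeToRetry mirrors the wrapped-error check proposed in this PR.
func safeToRetry(err error) bool {
	var r interface{ SafeToRetry() bool }
	if errors.As(err, &r) {
		return r.SafeToRetry()
	}
	return false
}

func main() {
	wrapped := fmt.Errorf("failed to deallocate cached statement(s): %w", lockErr{})
	fmt.Println(safeToRetry(wrapped)) // true: the wrapper inherits retry safety
}
```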

@tjasko (Contributor, Author) commented May 8, 2024

You know what Jack, I was writing up a long response, but after some rubber duck debugging myself, I realized there was an issue within the application where an existing mutex wasn't blocking all read/write DB operations. That's probably why I was running into this, derp.

In regard to this change, I'll defer to whatever you think is best. To be honest, I did find it a little odd that this was the only place (that I could find) where a retriable error was being wrapped. Per standard Golang conventions, one would typically assume that an error wrapping one with a `SafeToRetry()` method is retriable (hence the reason for this PR), but if that's not the case, we can just close out this PR. I would be a little worried about future contributors wrapping this error and accidentally introducing a bug when the error should be retried... so perhaps that's a valid reason to merge this in. On the other hand, this could be a documentation problem, and we'd just need to explicitly document that these errors shouldn't be wrapped.

Thanks for the assistance here!

@jackc (Owner) commented May 9, 2024

On further thought, I think this is a correct change.

@jackc merged commit 4e2e7a0 into jackc:master on May 9, 2024. 12 checks passed.
@jackc (Owner) commented May 9, 2024

I've ported the change to v5 as well.

@tjasko deleted the fix/retry-with-wrapped-errs branch on May 9, 2024 at 23:03.
@tjasko (Contributor, Author) commented May 9, 2024

Nice, thanks for your help!
